Predefined Sparseness in Recurrent Sequence Models
Inducing sparseness while training neural networks has been shown to yield
models with a lower memory footprint but similar effectiveness to dense models.
However, sparseness is typically induced starting from a dense model, and thus
this advantage does not hold during training. We propose techniques to enforce
sparseness upfront in recurrent sequence models for NLP applications, to also
benefit training. First, in language modeling, we show how to increase hidden
state sizes in recurrent layers without increasing the number of parameters,
leading to more expressive models. Second, for sequence labeling, we show that
word embeddings with predefined sparseness lead to similar performance as dense
embeddings, at a fraction of the number of trainable parameters.Comment: the SIGNLL Conference on Computational Natural Language Learning
(CoNLL, 2018
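To illustrate the parameter-budget argument in this abstract, the sketch below compares a dense recurrent weight matrix against a hypothetical block-diagonal "predefined sparse" layout that supports a larger hidden state at the same parameter count. The block-diagonal scheme is an assumed stand-in for illustration, not necessarily the paper's exact construction.

```python
# Hypothetical sketch: a larger hidden state at the same parameter budget,
# via a predefined block-diagonal sparsity pattern on the recurrent matrix.
# The specific layout here is illustrative, not the paper's exact scheme.

def dense_param_count(h):
    # A dense h x h recurrent weight matrix uses h * h parameters.
    return h * h

def block_diagonal_param_count(h_large, num_blocks):
    # Split a larger hidden state into independent dense blocks of size
    # (h_large / num_blocks) each; only the blocks hold parameters.
    b = h_large // num_blocks
    return num_blocks * b * b

# Dense model, hidden size 512:
dense = dense_param_count(512)                 # 512 * 512 = 262144
# Block-diagonal model, hidden size 1024 in 4 blocks -- same budget:
sparse = block_diagonal_param_count(1024, 4)   # 4 * 256 * 256 = 262144
assert dense == sparse
```

The point the abstract makes is visible here: the sparse layout doubles the hidden state size (512 to 1024) without adding a single trainable parameter.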
Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks
In this paper, we introduce a novel type of Rectified Linear Unit (ReLU),
called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an
unbounded positive and negative image, can be used as a drop-in replacement for
a tanh activation function in the recurrent step of Quasi-Recurrent Neural
Networks (QRNNs; Bradbury et al., 2017). Similar to ReLUs, DReLUs are less
prone to the vanishing gradient problem, they are noise robust, and they induce
sparse activations.
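A minimal sketch of the activation described above, assuming the form DReLU(a, b) = max(0, a) − max(0, b), i.e. the difference of two rectifications over a pair of inputs; consult the paper for the exact formulation.

```python
# Sketch of a Dual Rectified Linear Unit: a two-input function with an
# unbounded positive and negative image. Assumed form:
#   DReLU(a, b) = max(0, a) - max(0, b)

def relu(x):
    return max(0.0, x)

def drelu(a, b):
    # Difference of two rectifications: the output ranges over all reals,
    # and is exactly zero when both inputs are non-positive (sparsity).
    return relu(a) - relu(b)

print(drelu(2.0, 0.5))    # 1.5  -> positive image
print(drelu(0.5, 2.0))    # -1.5 -> negative image
print(drelu(-1.0, -3.0))  # 0.0  -> sparse activation
```

Unlike tanh, this output is unbounded in both directions, which is what lets it replace tanh in the recurrent step while keeping ReLU-like gradient behavior.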
We independently reproduce the QRNN experiments of Bradbury et al. (2017) and
compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long
Short-Term Memory networks (LSTMs) on sentiment classification and word-level
language modeling. Additionally, we evaluate on character-level language
modeling, showing that we are able to stack up to eight QRNN layers with
DReLUs, thus making it possible to improve the current state-of-the-art in
character-level language modeling over shallow architectures based on LSTMs.
Learning When Not to Answer: A Ternary Reward Structure for Reinforcement Learning based Question Answering
In this paper, we investigate the challenges of using reinforcement learning
agents for question-answering over knowledge graphs for real-world
applications. We examine the performance metrics used by state-of-the-art
systems and determine that they are inadequate for such settings. More
specifically, they do not evaluate the systems correctly for situations when
there is no answer available and thus agents optimized for these metrics are
poor at modeling confidence. We introduce a simple new performance metric for
evaluating question-answering agents that is more representative of practical
usage conditions, and optimize for this metric by extending the binary reward
structure used in prior work to a ternary reward structure which also rewards
an agent for not answering a question rather than giving an incorrect answer.
We show that this can drastically improve the precision of answered questions
while sacrificing only a small number of previously correctly answered
questions. Employing a supervised learning strategy using depth-first-search
paths to bootstrap the reinforcement learning algorithm further improves
performance.
Comment: Accepted at NAACL 2019. Version 1 was presented at the NIPS 2018 Workshop on Relational Representation Learning.
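The ternary reward structure described in this abstract can be sketched as follows; the specific reward values are illustrative assumptions, not the paper's, but the ordering (correct > abstain > wrong) is the essential property.

```python
# Hedged sketch of a ternary reward for a QA agent: prior work's binary
# correct/incorrect reward is extended so that abstaining ("no answer")
# earns its own intermediate reward. Values below are illustrative.

def ternary_reward(answer, gold_answer):
    if answer is None:           # agent chose not to answer
        return 0.0               # better than answering incorrectly...
    if answer == gold_answer:
        return 1.0               # correct answer: full reward
    return -1.0                  # ...which is penalized

print(ternary_reward("Paris", "Paris"))  # 1.0
print(ternary_reward(None, "Paris"))     # 0.0
print(ternary_reward("Lyon", "Paris"))   # -1.0
```

An agent trained against this signal learns to abstain when its confidence is low, which is exactly the precision-over-recall trade-off the abstract reports.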
Improving language modeling using densely connected recurrent neural networks
In this paper, we introduce the novel concept of densely connected layers
into recurrent neural networks. We evaluate our proposed architecture on the
Penn Treebank language modeling task. We show that we can obtain similar
perplexity scores with six times fewer parameters compared to a standard
stacked 2-layer LSTM model trained with dropout (Zaremba et al. 2014). In
contrast with the current usage of skip connections, we show that densely
connecting only a few stacked layers with skip connections already yields
significant perplexity reductions.
Comment: Accepted at the Workshop on Representation Learning, ACL 2017.
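The dense connectivity pattern the abstract refers to can be sketched as below: each layer consumes the concatenation of the input and all preceding layers' outputs, analogous to DenseNet but applied across stacked recurrent layers. The stand-in layers here are plain functions; a real model would use LSTM or GRU cells.

```python
# Minimal sketch of densely connected stacked layers: layer k receives the
# concatenation of the original input and the outputs of layers 1..k-1.
# Layers are stand-in callables over lists of floats, not real RNN cells.

def dense_stack(x, layers):
    features = [x]
    for layer in layers:
        # Concatenate every feature vector produced so far.
        concatenated = [v for f in features for v in f]
        features.append(layer(concatenated))
    # The final representation concatenates all layer outputs as well.
    return [v for f in features for v in f]

# Toy "layers" that collapse their input into a one-element feature:
layers = [lambda f: [sum(f)], lambda f: [sum(f)]]
print(dense_stack([1.0, 2.0], layers))  # [1.0, 2.0, 3.0, 6.0]
```

Because later layers see earlier features directly, the skip connections shorten gradient paths, which is one intuition for why a few densely connected layers can match a larger plain stack.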
A Simple Geometric Method for Cross-Lingual Linguistic Transformations with Pre-trained Autoencoders
Powerful sentence encoders trained for multiple languages are on the rise.
These systems are capable of embedding a wide range of linguistic properties
into vector representations. While explicit probing tasks can be used to verify
the presence of specific linguistic properties, it is unclear whether the
vector representations can be manipulated to indirectly steer such properties.
We investigate the use of a geometric mapping in embedding space to transform
linguistic properties, without any tuning of the pre-trained sentence encoder
or decoder. We validate our approach on three linguistic properties using a
pre-trained multilingual autoencoder and analyze the results in both
monolingual and cross-lingual settings.
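One simple geometric mapping consistent with this abstract is a mean-offset shift between two property classes in embedding space, applied without touching the encoder or decoder. Whether the paper uses exactly this mean-offset form is an assumption of this sketch.

```python
# Hedged sketch: steer a linguistic property by shifting a sentence embedding
# along the offset between the mean vectors of a source and a target class
# (e.g. singular -> plural). The frozen encoder/decoder are not shown.

def mean_vector(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def transform(v, source_class, target_class):
    # v' = v + (mean(target_class) - mean(source_class))
    ms = mean_vector(source_class)
    mt = mean_vector(target_class)
    return [vi + (t - s) for vi, s, t in zip(v, ms, mt)]

# Toy 2-d example: source class clusters near (0, 0), target near (1, 1).
source = [[0.0, 0.1], [0.0, -0.1]]
target = [[1.0, 1.1], [1.0, 0.9]]
print(transform([0.2, 0.0], source, target))  # approximately [1.2, 1.0]
```

Feeding the transformed vector to the pre-trained decoder would then, ideally, produce a sentence with the target property, with no tuning of the autoencoder itself.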
Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?
Character-level features are currently used in different neural network-based
natural language processing algorithms. However, little is known about the
character-level patterns those models learn. Moreover, models are often
compared only quantitatively while a qualitative analysis is missing. In this
paper, we investigate which character-level patterns neural networks learn and
if those patterns coincide with manually-defined word segmentations and
annotations. To that end, we extend the contextual decomposition technique
(Murdoch et al. 2018) to convolutional neural networks which allows us to
compare convolutional neural networks and bidirectional long short-term memory
networks. We evaluate and compare these models for the task of morphological
tagging on three morphologically different languages and show that these models
implicitly discover understandable linguistic rules. Our implementation can be
found at https://github.com/FredericGodin/ContextualDecomposition-NLP
Comment: Accepted at EMNLP 2018.
Zero-Shot Cross-Lingual Sentiment Classification under Distribution Shift: an Exploratory Study
The brittleness of finetuned language model performance on
out-of-distribution (OOD) test samples in unseen domains has been well-studied
for English, yet is unexplored for multi-lingual models. Therefore, we study
generalization to OOD test data specifically in zero-shot cross-lingual
transfer settings, analyzing performance impacts of both language and domain
shifts between train and test data. We further assess the effectiveness of
counterfactually augmented data (CAD) in improving OOD generalization for the
cross-lingual setting, since CAD has been shown to benefit in a monolingual
English setting. Finally, we propose two new approaches for OOD generalization
that avoid the costly annotation process associated with CAD, by exploiting the
power of recent large language models (LLMs). We experiment with 3 multilingual
models, LaBSE, mBERT, and XLM-R trained on English IMDb movie reviews, and
evaluate on OOD test sets in 13 languages: Amazon product reviews, Tweets, and
Restaurant reviews. Results echo the OOD performance decline observed in the
monolingual English setting. Further, (i) counterfactuals from the original
high-resource language do improve OOD generalization in the low-resource
language, and (ii) our newly proposed cost-effective approaches reach similar
or up to +3.1% better accuracy than CAD for Amazon and Restaurant reviews.
Comment: The 3rd Workshop on Multilingual Representation Learning (MRL@EMNLP 2023).
The Normalized Freebase Distance
In this paper, we propose the Normalized Freebase Distance (NFD), a new measure for determining semantic concept relatedness, based on the same principles as the Normalized Web Distance (NWD). We illustrate that the NFD is more effective when comparing ambiguous concepts.
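Since the NFD follows similar principles to the NWD, a sketch of an NWD-style distance over concept occurrence counts gives the flavor; here f(x) and f(x, y) would be (co-)occurrence counts over Freebase, and N the total number of indexed items. The exact NFD formula may differ from this NWD form.

```python
# NWD-style distance over (co-)occurrence counts:
#   d(x, y) = (max(log f(x), log f(y)) - log f(x, y))
#             / (log N - min(log f(x), log f(y)))
# Smaller values indicate more closely related concepts.
from math import log

def nwd_style_distance(fx, fy, fxy, n):
    num = max(log(fx), log(fy)) - log(fxy)
    den = log(n) - min(log(fx), log(fy))
    return num / den

# Strongly related concepts co-occur almost as often as they occur alone:
near = nwd_style_distance(1000, 800, 700, 10**9)
# Weakly related concepts rarely co-occur, giving a larger distance:
far = nwd_style_distance(1000, 800, 5, 10**9)
print(near < far)  # True
```

The normalization by log N is what makes distances comparable across corpora of different sizes, which is the property the NFD inherits when Freebase replaces web counts.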